Spline regression for zero-inflated models 
Abstract

We propose a regression model for count data when the classical GLM
approach is too rigid due to a high proportion of zero counts and a nonlinear
influence of continuous covariates. Zero inflation is applied to account for
the presence of excess zeros, with separate link functions for the zero and
the nonzero component. Nonlinearity in covariates is captured by spline
functions based on B-splines. Our algorithm relies on maximum-likelihood
estimation and allows for adaptive box-constrained knots, thus improving the
goodness of the spline fit and allowing for detection of sensitivity
changepoints. A simulation study substantiates the numerical stability of the
algorithm to infer such models. The AIC criterion is shown to serve well for
model selection, in particular if non-linearities are weak, such that BIC tends
to overly simplistic models. We fit the introduced models to real data on
children's dental health, linking caries counts with the so-called Body Mass
Index (BMI) and other socio-economic factors. This reveals a puzzling
non-monotonic influence of BMI on caries counts which is yet to be explained
by clinical experts.

Keywords: B-splines; count data; DMFS index; nonlinear regression; overdispersion; zero inflation.



1 Introduction 

Classical log-linear regression for Poisson count data results in an inadequate fit
to data if at least one of its two basic assumptions is violated. First, even
conditionally on the covariates, the counted events may not arise independently
and with identical distribution. In real data, this typically leads to overdispersion,
breaking the mean-variance equality of Poisson counts. Second, the assumption
of a linear influence of covariates on the log-transformed expectation may be
too rigid. In fact, for some applications the detection of changes in sensitivity
with respect to a covariate is fundamental.

In this work, we tackle a particular case of the first problem where overdispersion
is (partly) due to an excess of zero counts. We assume a binary random
effect which determines whether an observation belongs to a class of structural
zeros or to the class of ordinary counts. Therefore, we use Zero Inflation
(ZI), which proposes a mixture of a Dirac mass at zero with the respective count
distribution, see [T5]. Switching from a Poisson to a negative binomial distribution
can account for additional overdispersion in the ordinary component.
Applications of ZI regression are numerous in the medical context, e.g. in public
health studies like [1] and [2]. Other, less common applications are e.g.
software fault prediction in [11] and patent outsourcing rates in [3].

To deal with the second problem of non-linearities, we resort to B-splines
(cf. [4]). Spline-based approaches have become common for regression models
in statistics, see the presentation [5] of spline smoothing and non-parametric
regression, and the Multivariate Adaptive Regression Splines technique (MARS)
used for normal residuals by [7]. Whereas smoothing spline methods use the
set of data points as spline knots and impose a smoothness penalty, regression
spline methods tend to favor a small number of spline knots. For the latter
methods, variable knots may be free knots of parametric nature or knots that
are chosen by an adaptive method. The smoothness penalty introduced in the
intermediate form of penalized regression splines allows the use of a higher
number of knots without overfitting. In summary, splines offer a semi-parametric
setting in which the effective model dimension remains controllable.

Our approach replaces a linear regression coefficient by a spline curve
determined by its degree, its knots and the coefficients of the resulting
B-spline basis. A variant of the Evolutive Bounded Optimal Knots approach (EBOK,
[13]) is applied in order to use adaptive knots in the numerical calculation of
the maximum-likelihood estimator.

The introduced models for count data regression unite interpretability of
parameters and high flexibility with respect to non-standard behavior, but avoid
numerical complications by constraining knots. Moreover, iterating the EBOK
method with different initial constraints leads asymptotically to globally optimal
knots (see [13]). This class of models belongs to the large class of
Generalized Additive Models (GAM; [9]).

In the remainder of the paper, we first present details of the model in
the following section. Issues and the algorithm related to the spline modeling
approach are discussed in Section 3, followed by remarks on model selection
criteria in Section 4. The simulation study in Section 5 sheds light on model
choice and goodness-of-fit issues, with particular focus on the selection of a
spline space. We apply the presented models to caries count data of 12-year-old
French children in Section 6, with the Body Mass Index as a potentially
non-linear continuous covariate. Open issues and further possible developments
are the subject of Section 7.

2 The Model 

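The Zero-Inflated Poisson distribution ZIP(μ, π) with parameters μ > 0, π ∈ [0, 1)
has probability mass function (pmf)

$$
P(Y = k) = \begin{cases}
\pi + (1 - \pi)\, e^{-\mu}, & k = 0, \\
(1 - \pi)\, \dfrac{\mu^{k}}{k!}\, e^{-\mu}, & k = 1, 2, \ldots
\end{cases}
$$

The Zero-Inflated Negative Binomial distribution ZINB(μ, ν, π) with expectation
μ > 0 of the negative binomial component, dispersion parameter ν > 0 and
π ∈ [0, 1) has pmf

$$
P(Y = k) = \begin{cases}
\pi + (1 - \pi) \left( \dfrac{\nu}{\mu + \nu} \right)^{\nu}, & k = 0, \\
(1 - \pi)\, \dfrac{\Gamma(k + \nu)}{\Gamma(\nu)\, k!} \left( \dfrac{\nu}{\mu + \nu} \right)^{\nu} \left( \dfrac{\mu}{\mu + \nu} \right)^{k}, & k = 1, 2, \ldots
\end{cases}
$$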
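As a numerical check of the two pmfs, the following minimal Python sketch evaluates them on the log scale with `math.lgamma` for stability. The function names `zip_pmf` and `zinb_pmf` are hypothetical helpers; the negative binomial component is written with mean μ and variance μ + μ²/ν, which is the parameterization implied by the pmf above.

```python
import math


def zip_pmf(k, mu, pi):
    """P(Y = k) for Y ~ ZIP(mu, pi), mu > 0, 0 <= pi < 1."""
    # Poisson mass computed on the log scale to avoid overflow for large k
    poisson = math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))
    return pi + (1 - pi) * poisson if k == 0 else (1 - pi) * poisson


def zinb_pmf(k, mu, nu, pi):
    """P(Y = k) for Y ~ ZINB(mu, nu, pi); the NB part has mean mu, variance mu + mu**2 / nu."""
    log_nb = (math.lgamma(k + nu) - math.lgamma(nu) - math.lgamma(k + 1)
              + nu * math.log(nu / (mu + nu)) + k * math.log(mu / (mu + nu)))
    nb = math.exp(log_nb)
    return pi + (1 - pi) * nb if k == 0 else (1 - pi) * nb


# Both pmfs sum to one, and the overall mean is (1 - pi) * mu in both cases.
print(sum(zip_pmf(k, 2.5, 0.3) for k in range(200)))
print(sum(k * zinb_pmf(k, 2.5, 1.5, 0.3) for k in range(2000)))
```

The overall mean (1 − π)μ illustrates why μ is the expectation of the count component only: structural zeros pull the marginal mean down by the factor 1 − π.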



We denote univariate B-splines by $N_{i,d}(\cdot, \Lambda)$, $i = 1, \ldots, m + d + 1$, with a grid $\Lambda$
of $m$ interior knots on a finite interval $[a, b]$, boundary knots $a$, $b$ and the degree $d$ of spline
smoothness. The degree $d = 1$ corresponds to linear splines and $d = 3$ to cubic
splines.
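To make the indexing concrete, here is a minimal sketch of the Cox-de Boor recursion that evaluates all $m + d + 1$ basis functions. It assumes the common clamped convention in which the boundary knots a and b are repeated d + 1 times; the function name `bspline_basis` is hypothetical.

```python
import numpy as np


def bspline_basis(x, inner_knots, a, b, d):
    """Evaluate N_{i,d}(x, Lambda), i = 1..m+d+1, by the Cox-de Boor recursion.

    ASSUMPTION: boundary knots a, b are repeated d + 1 times (clamped knot vector).
    Returns an array of shape (len(x), m + d + 1).
    """
    x = np.asarray(x, dtype=float)
    t = np.r_[[a] * (d + 1), np.sort(inner_knots), [b] * (d + 1)]
    # Degree 0: indicators of the half-open knot intervals [t_j, t_{j+1})
    B = np.zeros((len(x), len(t) - 1))
    for j in range(len(t) - 1):
        B[:, j] = (t[j] <= x) & (x < t[j + 1])
    # Include the right boundary b in the last non-degenerate interval
    last = np.max(np.nonzero(t[:-1] < t[1:])[0])
    B[x == b, last] = 1.0
    # Raise the degree one step at a time; 0/0 terms from repeated knots are skipped
    for p in range(1, d + 1):
        Bn = np.zeros((len(x), len(t) - p - 1))
        for j in range(len(t) - p - 1):
            left = t[j + p] - t[j]
            right = t[j + p + 1] - t[j + 1]
            if left > 0:
                Bn[:, j] += (x - t[j]) / left * B[:, j]
            if right > 0:
                Bn[:, j] += (t[j + p + 1] - x) / right * B[:, j + 1]
        B = Bn
    return B


x = np.linspace(0.0, 1.0, 101)
B = bspline_basis(x, [0.25, 0.5, 0.75], 0.0, 1.0, d=3)
print(B.shape)                          # (101, 7): m + d + 1 = 3 + 3 + 1 columns
print(np.allclose(B.sum(axis=1), 1.0))  # True: partition of unity
```

The row sums illustrate the partition-of-unity property that matters for identifiability once several spline terms enter the same linear predictor.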

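We follow Lambert's model of two regression equations as presented in [12],
allowing covariates to influence both the probability of a structural zero and
the count expectation. The model for n observations $Y_i$ is

$$
\begin{aligned}
Y_i \mid (X_i, Z_i) = (x_i, z_i) &\sim \mathrm{ZIP}\big(\mu(x_i), \pi(z_i)\big) \ \text{or} \ \mathrm{ZINB}\big(\mu(x_i), \nu, \pi(z_i)\big), \quad \text{independent}, \\
\log \mu(x_i) &= g^{c}(x_i, \beta^{c}), \\
\operatorname{logit} \pi(z_i) &= g^{z}(z_i, \beta^{z}).
\end{aligned}
$$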

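The covariate vectors $x_i$, $z_i$ may coincide, e.g. if each component depends on
all available covariates. Dependence on covariates can be modeled either by a
linear regression coefficient or, for continuous covariates, by a spline function.
That is,

$$
g^{c}(x_i, \beta^{c}) = \sum_{j=1}^{p} g^{c}_j(x_{ij}, \beta^{c}_j), \qquad
g^{z}(z_i, \beta^{z}) = \sum_{j=1}^{q} g^{z}_j(z_{ij}, \beta^{z}_j),
$$

where for mod ∈ {c, z} either

$$ g^{\mathrm{mod}}_j(u, \beta_j) = u\, \beta_j $$

or, with $m_j$ interior knots in $\Lambda_j$,

$$ g^{\mathrm{mod}}_j(u, \beta_j) = \sum_{i=1}^{m_j + d + 1} \beta_{ji}\, N_{i,d}(u, \Lambda_j). $$

If splines are applied for $|J| > 1$ covariates $x_{\cdot j}$, $j \in J$, we remove $|J| - 1$ of the
terms $\beta_{j1} N_{1,d}(u, \Lambda_j)$, $j \in J$, to ensure an identifiable model. The partition-of-unity
property $\sum_i N_{i,d} = 1$ gives

$$
\sum_{i=1}^{m_j + d + 1} \beta_{ji}\, N_{i,d}(u, \Lambda_j) = \beta_{j1} + \sum_{i > 1} (\beta_{ji} - \beta_{j1})\, N_{i,d}(u, \Lambda_j),
$$

so we combine the $|J|$ constants $\beta_{j1}$ into one single coefficient $\beta_0 = \sum_{j \in J} \beta_{j1}$, and
for the other coefficients we replace $\beta_{ji}$ by $\beta'_{ji} = \beta_{ji} - \beta_{j1}$, $i > 1$. If we apply the
linear model instead of a spline curve in one of the components, a regression
constant can be added as usual.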
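The identifiability problem can be seen directly in the rank of a design matrix. The sketch below is illustrative only: it uses linear B-splines (hat functions, built with `numpy.interp`) for two hypothetical covariates with arbitrarily chosen knots, stacks both spline bases, and shows that dropping $|J| - 1 = 1$ of the $N_{1,d}$ columns restores full column rank.

```python
import numpy as np


def hat_basis(x, inner_knots, a, b):
    """Linear B-splines (d = 1) as hat functions on the nodes a, inner knots, b."""
    nodes = np.r_[a, np.sort(inner_knots), b]
    return np.column_stack([np.interp(x, nodes, e) for e in np.eye(len(nodes))])


rng = np.random.default_rng(0)
x1, x2 = rng.uniform(0, 1, 300), rng.uniform(0, 1, 300)
B1 = hat_basis(x1, [1 / 3, 2 / 3], 0.0, 1.0)  # 4 columns, rows sum to 1
B2 = hat_basis(x2, [0.5], 0.0, 1.0)           # 3 columns, rows sum to 1

full = np.column_stack([B1, B2])
reduced = np.column_stack([B1, B2[:, 1:]])    # drop |J| - 1 = 1 of the N_1 terms
print(full.shape[1], np.linalg.matrix_rank(full))        # 7 6  (rank deficient)
print(reduced.shape[1], np.linalg.matrix_rank(reduced))  # 6 6  (identifiable)
```

Both blocks contain the constant function, which is exactly the one linear dependency removed by the reparameterization.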

For cubic splines, we can optionally impose that second-order derivatives of
a spline curve are zero at the boundary knots a, b of the respective covariate's
domain, resulting in a natural cubic spline. This counteracts the strong
near-boundary variability of non-linear spline curves and reduces the number of free
B-spline coefficients by 2. The spline model is flexible in the sense that other
algorithmically tractable constraints on the form of the spline curve could be
imposed.
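One way to impose the natural boundary conditions, sketched here under the assumption of a clamped knot vector and hypothetical interior knots, is to write the two conditions as linear constraints $C\beta = 0$ on the B-spline coefficients and parameterize the solutions through a null-space basis (`scipy.linalg.null_space`). This is an illustration of the constraint, not necessarily how the authors implement it.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.linalg import null_space

a, b, d = 0.0, 1.0, 3
inner = [0.25, 0.5, 0.75]                        # hypothetical interior knots
t = np.r_[[a] * (d + 1), inner, [b] * (d + 1)]   # clamped knot vector
K = len(t) - d - 1                               # m + d + 1 = 7 B-spline coefficients

# C[r, i] = second derivative of the i-th basis function at a (r = 0) and at b (r = 1)
C = np.array([[BSpline(t, np.eye(K)[i], d).derivative(2)(u) for i in range(K)]
              for u in (a, b)])
Z = null_space(C)   # columns span all coefficient vectors with s''(a) = s''(b) = 0
print(Z.shape)      # (7, 5): two fewer free coefficients

# Any gamma gives a natural cubic spline with coefficients Z @ gamma
gamma = np.random.default_rng(0).normal(size=Z.shape[1])
s = BSpline(t, Z @ gamma, d)
print(s.derivative(2)(np.array([a, b])))  # numerically zero at both boundary knots
```

Estimating γ instead of β then automatically yields natural splines, and the loss of two columns matches the reduction of the model dimension by 2.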

The (semi-)parametric nature of the model, with coefficients and knots as
parameters, results in models that are flexible yet remain conveniently
interpretable and comparable; see also the remarks on model selection below.
Quantities like derivatives are analytically accessible.

We fit this model by maximum-likelihood estimation of the β-coefficients,
of the knot locations Λ if considered as variable, and of additional parameters
that are constant for all observations, e.g. the extent of overdispersion ν of ZINB.
Link functions other than the proposed ones may be chosen if convenient. The
proposed model is of additive type, but extensions with more refined multivariate
spline techniques can deal with interaction effects of covariates.
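The maximum-likelihood fit with fixed knots can be sketched as follows. This minimal example assumes a linear spline (hat-function basis) in the count component, a linear logit in the zero component, and a generic quasi-Newton optimizer (`scipy.optimize.minimize` with BFGS); the simulated effects and all function names are hypothetical, not taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln


def hat_basis(x, inner_knots, a, b):
    """Linear B-splines (d = 1) as hat functions on the nodes a, inner knots, b."""
    nodes = np.r_[a, np.sort(inner_knots), b]
    return np.column_stack([np.interp(x, nodes, e) for e in np.eye(len(nodes))])


def zip_negloglik(theta, Bc, Xz, y):
    """Negative ZIP log-likelihood with log mu = Bc @ beta_c and logit pi = Xz @ beta_z."""
    beta_c, beta_z = theta[: Bc.shape[1]], theta[Bc.shape[1]:]
    log_mu, eta_z = Bc @ beta_c, Xz @ beta_z
    mu = np.exp(log_mu)
    log_norm = np.logaddexp(0.0, eta_z)               # log(1 + e^eta) = -log(1 - pi)
    ll_zero = np.logaddexp(eta_z, -mu) - log_norm      # log(pi + (1 - pi) e^{-mu})
    ll_pos = -log_norm + y * log_mu - mu - gammaln(y + 1)
    return -np.sum(np.where(y == 0, ll_zero, ll_pos))


def fit_zip_spline(x, z, y, inner_knots):
    """Fixed-knot fit: spline in the count part, linear logit in the zero part."""
    Bc = hat_basis(x, inner_knots, x.min(), x.max())  # no separate intercept: rows sum to 1
    Xz = np.column_stack([np.ones_like(z), z])
    theta0 = np.zeros(Bc.shape[1] + Xz.shape[1])
    return minimize(zip_negloglik, theta0, args=(Bc, Xz, y), method="BFGS")


rng = np.random.default_rng(42)
n = 500
x, z = rng.uniform(0, 1, n), rng.uniform(0, 1, n)
mu = np.exp(1.0 + np.sin(2 * np.pi * x))              # hypothetical non-linear count effect
pi = 1.0 / (1.0 + np.exp(-(-1.0 + 1.5 * z)))          # hypothetical linear zero effect
y = np.where(rng.random(n) < pi, 0, rng.poisson(mu))
res = fit_zip_spline(x, z, y, inner_knots=[0.25, 0.5, 0.75])
print(res.x[-2:])  # estimated intercept and slope of the zero component
```

The log-sum-exp form of the zero branch keeps the likelihood finite even when π is close to 0 or 1; a ZINB version would add ν as one more scalar parameter.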

3 Optimal Knot Locations 

In general, the set-up of fixed knots is an arbitrary restriction of the set of
available spline curves. In particular, the B-spline basis may contain elements that
barely improve the fit, so that the model dimension is blown up by insignificant
parameters. For instance, think of an equidistant grid of knots, which performs poorly
if there are large regions of sparse observations in the covariate domain. However,
optimizing knot locations makes numerical optimization more challenging
for the following reasons:

• The number of free parameters in "good" models tends to increase.

• Analytic partial derivatives with respect to knots are difficult to establish;
hence we use numerical derivatives.

• Knots influence the likelihood differently from β-coefficients.
In particular, many suboptimal stationary points which correspond to
coinciding knots may arise (see [10] for this lethargy property), rendering
non-constrained knot optimization infeasible.

We use an Evolutive Bounded Optimal Knots algorithm (see [13]) that imposes
iteratively adaptable interval constraints on variable knots, based on a
partition of a covariate domain [a, b]. In general, the resulting optimized knots
are not necessarily globally optimal, but iterating the procedure with different
initial interval constraints would lead to convergence to the optimal set of knots.
To avoid coinciding knots and the associated discontinuities in spline functions,
a minimal positive distance between two knots can be defined. In the following, we
present and apply the algorithm for only one spline-modeled covariate; the extension
to an additive model as introduced in Section 2 is straightforward. However, for
the inclusion of interaction terms in other multivariate spline approaches, more
complex adaptation steps become an issue. We further exclude iteration with
different initial knot constraints and propose instead initial knots corresponding
to equiprobable quantiles, such that segments between knots contain the same
number of observations. We briefly outline the functioning of the algorithm:

• Initial knots are determined according to equiprobable quantiles of the
covariate.

• Each knot can vary within the boundaries of an interval, also called its box. 
Boxes for different knots do not overlap, and usually we separate boxes 
by a small distance to avoid exactly coinciding knots. Initial boundaries 
are fixed to the center of the segment between two initial knots. 

• We determine initial B-spline coefficients and other estimated parameters 
by numerical maximisation of the likelihood with the fixed initial knots. 

• We maximize the likelihood numerically with respect to box-constrained knots,
with the initial parameter estimates as starting point.

• If one of the knots coincides with a boundary of its box, we shift this 
boundary such that it is in the middle between this knot and its neigh- 
boring knot. The adjacent boundary of the box of the neighboring knot 
is shifted accordingly such that boxes do not overlap and the separating 
distance is preserved. 

• If one of the knots coincides with a boundary, but the neighboring knot is 
too close to shift this boundary, we do nothing. Usually this means that 
there is a sharp break in the curvature of the fitted spline curve. 

• If box boundaries have been shifted, we iterate the procedure with the 
adapted box configuration. 

• If no boundaries are to be shifted, we stop the iteration, and the last set 
of knots and estimated parameters defines the fitted model. 
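The steps above can be sketched in Python as follows. This is a simplified illustration in the spirit of EBOK, not the authors' implementation, and it makes several assumptions: a plain Poisson likelihood instead of ZIP/ZINB, a linear spline in truncated-power form (which spans the same space as linear B-splines with these knots), `scipy.optimize.minimize` with L-BFGS-B for the box constraints (gradients by finite differences), fixed outer box boundaries at a + ε and b − ε, and a hypothetical "too close" rule (neighbouring knots closer than ε + 2·tol).

```python
import numpy as np
from scipy.optimize import minimize


def design(x, knots):
    """Linear spline in truncated power form: columns 1, x and (x - kappa)_+ per knot."""
    return np.column_stack([np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots])


def negloglik(theta, x, y, m):
    """Poisson negative log-likelihood with log link (constant log y! dropped)."""
    coef, knots = theta[: m + 2], theta[m + 2:]
    eta = design(x, knots) @ coef
    return np.sum(np.exp(eta) - y * eta)


def ebok_sketch(x, y, m=2, eps=None, max_iter=20, tol=1e-6):
    """Simplified box-constrained knot search in the spirit of EBOK (not the authors' code)."""
    a, b = x.min(), x.max()
    eps = 1e-3 * (b - a) if eps is None else eps           # minimal separation between boxes
    knots = np.quantile(x, np.arange(1, m + 1) / (m + 1))  # equiprobable quantiles
    mids = (knots[:-1] + knots[1:]) / 2                     # initial box boundaries
    lo = np.r_[a + eps, mids + eps / 2]                     # outer boundaries stay fixed (assumption)
    hi = np.r_[mids - eps / 2, b - eps]
    # Initial coefficients with the initial knots held fixed
    coef0 = np.r_[np.log(y.mean()), np.zeros(m + 1)]
    fixed = minimize(lambda c: negloglik(np.r_[c, knots], x, y, m), coef0, method="BFGS")
    theta = np.r_[fixed.x, knots]
    for it in range(1, max_iter + 1):
        # Joint fit with box-constrained knots; L-BFGS-B uses finite-difference gradients here
        bounds = [(None, None)] * (m + 2) + list(zip(lo, hi))
        res = minimize(negloglik, theta, args=(x, y, m), method="L-BFGS-B", bounds=bounds)
        theta, knots = res.x, res.x[m + 2:]
        shifted = False
        for k in range(m - 1):  # shared boundary between the boxes of knots k and k + 1
            hits = knots[k] >= hi[k] - tol or knots[k + 1] <= lo[k + 1] + tol
            gap = knots[k + 1] - knots[k]
            if hits and gap > eps + 2 * tol:  # otherwise the neighbour is too close: do nothing
                mid = (knots[k] + knots[k + 1]) / 2
                hi[k], lo[k + 1] = mid - eps / 2, mid + eps / 2
                shifted = True
        if not shifted:
            break
    return {"coef": theta[: m + 2], "knots": knots, "nll": res.fun,
            "nll_fixed": fixed.fun, "iterations": it}


rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 400)
eta = 0.5 + 2.0 * x - 4.0 * np.maximum(x - 0.3, 0) + 3.0 * np.maximum(x - 0.8, 0)
y = rng.poisson(np.exp(eta))
out = ebok_sketch(x, y, m=2)
print(out["knots"], out["iterations"])
```

Moving a shared boundary to the midpoint between two knots, minus and plus ε/2, keeps both knots feasible, so each restart begins at the previous optimum and the likelihood cannot decrease across iterations.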

4 Model selection 

We consider AIC and BIC for model selection. Moreover, (cross-validation)
residuals allow us to evaluate a model's predictive power. The difficulty lies in
determining the dimension of spline models. With a parametric interpretation
of spline coefficients and knots, we count each estimated scalar as a model
parameter. So for each spline curve included in the model, we add to the model
dimension the sum $m + d + 1 + m_f$ of the number $m$ of inner knots, of the
spline order $d + 1$ (2 for linear splines, 4 for cubic splines) and of the number
$m_f \le m$ of free inner knots. This model dimension corresponds to $d + 1$ initial
parameters associated with the left endpoint $a$ of the covariate interval $[a, b]$,
to $m$ change-points of which $m_f$ are free parameters, and to the $m$ respective
jumps in the $d$-th derivative. For instance, a linear free-knot spline $s$ with
$d = 1$ possesses an initial level $s(a)$, an initial sensitivity $s'(a)$ and $m = m_f$
breakpoints with the respective sensitivity changes. If natural cubic splines are
applied, the model dimension is reduced by 2.
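The dimension count translates directly into the information criteria. The helper names below are hypothetical; `aic` and `bic` use the standard definitions 2k − 2 log L and k log n − 2 log L.

```python
import math


def spline_dim(m, d, m_free=0, natural=False):
    """Parameters of one spline curve: m + d + 1 coefficients plus m_free free knots."""
    if not 0 <= m_free <= m:
        raise ValueError("need 0 <= m_free <= m")
    if natural and d != 3:
        raise ValueError("natural splines are defined here for cubic splines only")
    return m + d + 1 + m_free - (2 if natural else 0)


def aic(loglik, k):
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    return k * math.log(n) - 2 * loglik


print(spline_dim(3, 1, m_free=3))      # 8: linear free-knot spline with 3 knots
print(spline_dim(2, 3))                # 6: cubic spline with 2 fixed knots
print(spline_dim(2, 3, natural=True))  # 4: natural cubic spline with 2 fixed knots
```

For example, with n = 768 observations each additional parameter costs log(768) ≈ 6.64 in BIC but only 2 in AIC, which illustrates why BIC penalizes spline models with many coefficients and knots so strongly.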

Spline spaces include all linear functions, but the penalisation of higher model
dimension favors the simpler models with a linear coefficient when non-linear
effects are not strong. If the sample is large, BIC penalizes strongly in
the semi-parametric setting of splines, which may include a rather high number
of parameters compared to purely parametric models. This seems disproportionate,
so we prefer AIC, a choice which is substantiated by the
simulation study in the following section. In general, one should be cautious
when comparing models whose parameters differ in nature. In
the literature, no consensus has yet been reached on how to assess the effective model
dimension when splines are involved. Further model checks can be based on a
more profound analysis of residuals.

In the application to dental health data we opted for a preselection of the
models performing best in terms of AIC among a wide variety of available
models. To obtain our designated "winner model" and to reduce selection bias, we
then chose, among these preselected models, the one with the smallest
cross-validation mean residual error, i.e. the best predictive power.
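The two-stage selection can be written in a few lines. This sketch assumes each fitted model is summarized by a dictionary with the hypothetical keys `name`, `aic` and `cv_error`; the default shortlist size of 20 mirrors the number of preselected models in the application.

```python
def select_winner(models, top=20):
    """Shortlist the `top` models by AIC, then pick the smallest cross-validation error."""
    shortlist = sorted(models, key=lambda mod: mod["aic"])[:top]
    return min(shortlist, key=lambda mod: mod["cv_error"])


candidates = [
    {"name": "A", "aic": 100.0, "cv_error": 2.50},
    {"name": "B", "aic": 101.0, "cv_error": 2.40},
    {"name": "C", "aic": 150.0, "cv_error": 2.00},  # good CV error, but not preselected
]
print(select_winner(candidates, top=2)["name"])  # B
```

Restricting the cross-validation comparison to an AIC shortlist keeps the number of compared models small, which limits the optimism that comes from picking the best of very many candidates.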

5 A simulation study 

We focus on two issues:

• Study (1): Retrieve the B-spline coefficients of a simulated model when
the spline space is known.

• Study (2): Find the best-fitting spline model when the simulated model
is not equivalent to a spline model.



In each case, we use ZIP regression. We assess the (relative) goodness-of-fit for
different models based on AIC, BIC and the cross-validation mean residual error
with respect to raw residuals (MRE). We randomly choose 20 non-overlapping
subsamples of equal size for cross-validation to avoid the costly leave-one-out
procedure. Moreover, we apply the supremum and L²-norms to measure the
deviation of fitted spline curves from the simulated ones.
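Both evaluation tools can be sketched as follows. The exact form of the MRE is an assumption here (mean absolute raw residual on held-out observations), `fit` and `predict` are hypothetical callables supplied by the user, and the L²-distance is approximated on a uniform grid.

```python
import numpy as np


def cv_mre(x, y, fit, predict, n_folds=20, seed=0):
    """Cross-validated mean residual error over n_folds non-overlapping random subsamples.

    ASSUMPTION: MRE = mean absolute raw residual |y - yhat| on held-out data.
    `fit(x, y)` returns a model and `predict(model, x)` returns fitted means (hypothetical API).
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    errors = []
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        model = fit(x[train], y[train])
        errors.append(np.abs(y[test] - predict(model, x[test])))
    return np.concatenate(errors).mean()


def curve_norms(f_true, f_hat, a=0.0, b=1.0, n_grid=10001):
    """Supremum and L2 distance between two curves, approximated on a uniform grid."""
    grid = np.linspace(a, b, n_grid)
    diff = f_true(grid) - f_hat(grid)
    return np.max(np.abs(diff)), np.sqrt(np.mean(diff ** 2) * (b - a))


x = np.linspace(0, 1, 200)
y = np.full(200, 3.0)
print(cv_mre(x, y, fit=lambda xs, ys: ys.mean(), predict=lambda mod, xs: np.full(len(xs), mod)))  # 0.0
print(curve_norms(lambda g: g, lambda g: 0 * g))  # sup = 1, L2 close to 1/sqrt(3)
```

With 200 observations and 20 folds, each held-out subsample contains 10 observations, so each observation is predicted exactly once.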

In Study (1), we restrict the analysis to the more intricate cubic splines,
which usually lead to stronger variability in fitted curves than linear splines,
particularly near the boundaries. We generate 100 simulations of the model

$$
X_i \sim \mathcal{U}[0, 1], \qquad Y_i \mid X_i = x_i \sim \mathrm{ZIP}\big(\exp f_c(x_i),\ \operatorname{logit}^{-1} f_z(x_i)\big), \tag{1}
$$

with i = 1, ..., 200, where $f_c$ is a cubic spline with marginal knots 0 and 1 and
with 2 equidistant interior knots 1/3 and 2/3. The B-spline coefficients $a\beta_i$
($i = 1, \ldots, 6$) of $f_c$ are chosen such that $\beta_i$ alternates in sign as $(-1)^i$,
and $a \in \{0.5, 1, 2, 3\}$ determines how strongly the spline function oscillates,
see Figure 1.

Figure 1: Spline curves for simulated models of Study (1).

Tables 1 to 4 summarize, for the four given values of a, the goodness-of-fit
statistics for linear fits and cubic spline fits with up to three equidistant
fixed knots in the count component. We show median values and standard
deviations over the 100 simulations for the proposed criteria (note that the
standard deviation coincides for AIC and BIC). Additionally, for the two norms,
AIC, BIC and MRE, the number of times a model performed best is given. Median
values and standard deviations for the estimated regression constants and
coefficients of the zero component are also included.

The case a = 0.5 with the weakest oscillation is best fitted with a linear
model; even the norm criteria tend to favor the linear model. In general,
curves that are "close" to linear are better fitted by linear models, since higher
model dimension and increased variability make the spline fit less appealing.
As a increases, the actual model is better reconstructed.

AIC, MRE and the L²-norm show similar preferences, whereas BIC has a
stronger preference for the simpler linear model. The fitted linear coefficients
of the zero component show some bias in some cases, but no other anomalies.

In Study (2), we keep the model from Study (1) with 50 simulations, where
we change $f_c$ to the non-spline function

$$ f(x) = 1 + \sqrt{x} + \sin(4\pi x), \qquad x \in [0, 1], $$

which describes a periodic phenomenon $\sin(4\pi x)$ with a non-linear trend $1 + \sqrt{x}$.
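A data-generating sketch for this setting is given below. The zero-component function `f_z` used in the example is a hypothetical placeholder, since only the count-component function f is specified here; the helper names are likewise illustrative.

```python
import numpy as np


def f_study2(x):
    """Count-component function of Study (2): trend 1 + sqrt(x) plus sin(4 pi x)."""
    return 1 + np.sqrt(x) + np.sin(4 * np.pi * x)


def simulate_zip(f_c, f_z, n=200, seed=None):
    """Draw X ~ U[0, 1] and Y | X = x ~ ZIP(exp f_c(x), logit^{-1} f_z(x))."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n)
    mu = np.exp(f_c(x))
    pi = 1.0 / (1.0 + np.exp(-f_z(x)))
    structural = rng.random(n) < pi        # latent class of structural zeros
    y = np.where(structural, 0, rng.poisson(mu))
    return x, y


# HYPOTHETICAL zero component, for illustration only
x, y = simulate_zip(f_study2, lambda x: -1.0 + 0.5 * x, n=200, seed=3)
print(np.mean(y == 0), y.mean())
```

Drawing the latent class explicitly mirrors the mixture interpretation of the ZIP model: zeros arise either as structural zeros or from the Poisson component.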



We fit linear splines and cubic splines with fixed and adaptive knots (EBOK
method). Tables 5 to 8 show the results. MRE preferences are scattered over
the range of models, such that there is no clear preference. We always find a
good correspondence between AIC and the L²-norm, and partly with the supremum
norm, so AIC serves well in finding the best model. BIC tends to simpler models.
Variable knots tend to improve the examined criteria, and fewer knots are needed
to obtain the optimal value of a criterion. However, an improvement in the best
attainable BIC values seems questionable when switching to variable knots.

We conclude that, in practice, the presented class of ZI spline regression
models can be fitted to data with the proposed algorithm, which is
sufficiently robust. BIC works well if non-linearities are strong; in other cases
we advise AIC, which can be complemented by the analysis of residuals, in
particular of MRE.

6 Application to dental health data

Finding appropriate caries prediction models is a recurring subject in odontology,
see the review of [14] and the meta-analysis of [8] concerning risk factors for
dental caries in young children. A commonly accepted quantification of caries
incidence for an individual is the DMFS index, i.e. the count of decayed, missing
and filled tooth surfaces. We have at our disposal a data set for 768 French
children aged 12 years. We use the variables DMFS index, weight, height, school
type (public vs. private), sugar consumption (low vs. high) and consumption of
sweetened drinks (low vs. high). An important quantity with respect to an
individual's health is the so-called Body Mass Index (BMI). It is obtained by dividing
the weight (in kg) by the squared height (in m), thus defining a continuous scale
ranging from underweight to obesity (cf. [6]). A discussion of the association
between caries counts and BMI in a more traditional statistical context is given
in [15]. There the question was raised whether non-linear regression techniques could
better identify a possible dependence on BMI, since no definite conclusion on the
significance of BMI could be drawn. Figure 2 shows a scatterplot of DMFS and
BMI data, including a kernel density estimate of BMI and a moving average of
DMFS with respect to BMI. The moving average is calculated over a neighborhood
of 25 BMI values on each side. Naively, one might expect either no effect or
e.g. a monotonic increase of mean DMFS values with increasing BMI. Yet the
moving average shows two peaks and a dent in-between in a region of high density
of BMI observations, with an overall slightly increasing trend. Moreover, zero
observations are abundant in DMFS values. The ZI spline regression models are
therefore appropriate candidates for a formal model of these data. We
fitted a wide variety of models determined by all combinations of the following
options:

• with vs. without ZI

• BMI: linear in the count or the structural zero component or in both

  - regression constant only vs. regression constant and coefficient

• with vs. without a coefficient for school type

• with vs. without a coefficient for sugar consumption

• with vs. without a coefficient for consumption of sweetened drinks

• spline regression

  - adaptive knots vs. fixed equiprobable quantile knots

  - linear vs. cubic vs. natural cubic splines

Figure 2: Left: Scatterplot of DMFS and BMI data. Right: DMFS frequencies.
For spline curves, we considered at least one knot and a maximum number of
knots large enough for the optimal AIC values to be identified. Since we expect
non-linear effects to be rather weak, we neglect BIC in accordance with the results
of the preceding simulation study. We restrict the following analysis to finding
a single model that fits well. Depending on what we want to know about the
process that generates the data, subsets of the considered model classes may
be of interest, e.g. linear variable-knot models to identify breakpoints. We
summarize some noticeable characteristics of the fitted models:

Negative binomial models are clearly superior to Poisson models. So even after
including ZI in the model, overdispersion remains in the count
component.

A linear spline model with ZI and 3 fixed interior knots takes the top position 
with respect to AIC. 





Figure 3: Winner model. Includes the factor for the consumption of sweetened 
drinks and 4 (almost coinciding) knots. The rug indicates the observed DMFS 
data. 



The factor covariates turn up in the top AIC ranks. In particular, the
consumption of sweetened drinks is included in the two top models.

We mention that the strong penalization of BIC favors simple linear models
without ZI. In fact, the unconditional model with no influence of the
covariates turns out best.

We found that mean residual errors with leave-one-out cross-validation (MRE)
for the top 20 AIC models vary only slightly, between 2.410 and 2.433. We look
more closely at the top AIC model and at the best model with respect to MRE, which
we choose as "winner"; see Figures 3 and 4 for predictions of the structural zero
probability and the mean count μ. The zero probabilities do not depend on BMI. In both
models, the factor for high consumption of sweetened drinks is included and
clearly increases the count expectation. Curiously, the probability of structural
zeros increases slightly with high consumption. We conjecture that there is
actually no effect of this factor on structural zeros, i.e., that the corresponding
coefficient is 0. The winner model is the only variable-knot model among the
20 preselected models. Its plot for the structural zero probability is omitted
since it is virtually identical to the one in Figure 4. The model applies natural
splines and 4 knots that almost coincide, with ε = 0.001 × range(BMI) as
minimal knot distance in the EBOK algorithm. There is an abrupt jump in the
predicted values around the interior knot locations. In summary, both models
reproduce the two peaks and the dent in-between revealed by the moving average.
The winner model's abrupt jump in the dent is even more
baffling. An explanation for this interesting finding has yet to be established by
odontological researchers.
Figure 4: Best AIC model. With factor for consumption of sweetened drinks
and 3 (fixed) knots. The rug indicates the observed DMFS data. 

7 Conclusion and outlook 

We presented a class of models that provides high flexibility for regression
analysis of count data. In particular, we allow for a binary effect that decides on
membership in the class of structural zero observations. The spline approach serves
to identify non-linear relations. This is of great importance in fields like medicine or
disease control in order to correctly assess the influence of risk factors. When the
computational complexity of variable knots is manageable, the goodness of the
spline fit tends to improve, even with fewer knots than in the fixed-knot scenario.
A refinement of model choice may prevent a biased selection of the winner model,
which is due to the usually large number of eligible models. In general, multiple
tests and the comparison of different goodness-of-fit criteria help improve
the model selection procedure. Criteria for high-dimensional model spaces like
the slope heuristic could be considered. Since any parametric linear model can be
represented as a semi-parametric spline model, the decision whether a spline model
fits better could be based on procedures operating only on spline model fits.
Then the pooling of parametric and semi-parametric models, as done in the
presented application, can be avoided for model selection. We showed the utility
of such model fitting in an application to dental health data, where classical
GLM models and ZI-GLM models failed to reveal a significant influence of the
Body Mass Index on caries counts.
Table 1: Study (1): Linear fit and cubic spline fits for 100 simulations, a = 0.5.

                 lin.       1 knot     2 knots    3 knots
‖·‖∞    best     37         25         16         22
        med.     0.4787     0.4796     0.4633     0.4653
        sd       (0.2536)   (0.3781)   (0.4012)   (0.4441)
‖·‖₂    best     89         5          5          1
        med.     0.1928     0.2971     0.3317     0.364
        sd       (0.1147)   (0.1243)   (0.126)    (0.1495)
MRE     best     35         28         21         16
        med.     0.5871     0.5801     0.5809     0.5787
        sd       (0.07391)  (0.07362)  (0.07337)  (0.07315)
AIC     best     71         16         6          7
        med.     330.3      330.5      331.9      331.3
        sd       (32.16)    (31.72)    (31.83)    (31.93)
BIC     best     100        -          -          -
        med.     343.5      353.6      358.3      361
β₀ᶻ     med.     0.9773     0.8133     0.7484     0.7664
        sd       (0.6067)   (0.6447)   (0.6587)   (0.6576)
β₁ᶻ     med.     -0.8938    -0.7376    -0.7058    -0.7598
        sd       (1.098)    (1.667)    (1.688)    (1.658)

Table 2: Study (1): Linear fit and cubic spline fits for 100 simulations, a = 1.

                 lin.       1 knot     2 knots    3 knots
‖·‖∞    best     12         15         48         25
        med.     1.001      0.8161     0.5528     0.6432
        sd       (0.3142)   (0.451)    (0.4037)   (0.6481)
‖·‖₂    best     64         9          20         7
        med.     0.2542     0.3256     0.3409     0.3907
        sd       (0.1047)   (0.1157)   (0.1447)   (0.1531)
MRE     best     25         20         35         20
        med.     0.5982     0.5989     0.5921     0.5956
        sd       (0.06855)  (0.06817)  (0.06734)  (0.06885)
AIC     best     46         12         28         14
        med.     329.7      329.6      328.1      329.3
        sd       (29.71)    (29.41)    (29.32)    (28.99)
BIC     best     98         -          2          -
        med.     342.9      352.7      354.5      358.9
β₀ᶻ     med.     0.9504     0.8906     0.771      0.7398
        sd       (0.6368)   (0.5333)   (0.5237)   (0.5268)
β₁ᶻ     med.     -0.7811    -0.6593    -0.6756    -0.6786
        sd       (1.279)    (1.092)    (1.079)    (1.096)
Table 3: Study (1): Linear fit and cubic spline fits for 100 simulations, a = 2.

                 lin.       1 knot     2 knots    3 knots
‖·‖∞    best     2          2          59         37
        med.     1.721      1.454      0.5498     0.6712
        sd       (0.4325)   (0.7486)   (0.4782)   (0.6693)
‖·‖₂    best     16         6          61         17
        med.     0.4552     0.4504     0.347      0.3867
        sd       (0.08129)  (0.1226)   (0.1642)   (0.1893)
MRE     best     22         7          38         33
        med.     0.6035     0.6115     0.5969     0.6
        sd       (0.08173)  (0.07994)  (0.07858)  (127.2)
AIC     best     12         6          59         23
        med.     323.3      322.1      316.6      316.6
        sd       (32.37)    (30.98)    (29.35)    (29.68)
BIC     best     73         2          25         -
        med.     336.5      345.2      343        346.3
β₀ᶻ     med.     1.255      1.08       0.9983     0.972
        sd       (0.5357)   (1.001)    (0.523)    (0.5195)
β₁ᶻ     med.     -1.269     -0.952     -1.023     -1.057
        sd       (1.122)    (1.46)     (1.126)    (1.108)


Table 4: Study (1): Linear fit and cubic spline fits for 100 simulations, a = 3.

                 lin.       1 knot     2 knots    3 knots
‖·‖∞    best     -          -          63         37
        med.     2.525      2.569      0.5505     0.5438
        sd       (0.4793)   (1.237)    (0.4578)   (0.3987)
‖·‖₂    best     2          1          85         12
        med.     0.7175     0.6292     0.2804     0.3449
        sd       (0.1066)   (0.1043)   (0.162)    (0.1786)
MRE     best     20         4          39         37
        med.     0.7665     0.7564     0.7341     0.7323
        sd       (0.1283)   (0.1868)   (0.1112)   (0.1171)
AIC     best     2          1          71         26
        med.     363.3      352.9      335.8      338.6
        sd       (42.61)    (35.84)    (31.55)    (31.56)
BIC     best     19         1          77         3
        med.     376.5      376        362.2      368.3
β₀ᶻ     med.     1.519      1.083      0.9298     0.8992
        sd       (0.6304)   (0.5472)   (0.505)    (0.4898)
β₁ᶻ     med.     -1.662     -0.8224    -1.091     -1.034
        sd       (2.16)     (1.358)    (1.017)    (0.9697)





Table 5: Study (2): Linear fit and linear spline fits for 50 simulations.

Fixed knots:     lin.       1          2          3          4          5          6
‖·‖∞    best     -          -          -          -          2          12         36
        med.     1.321      1.429      1.303      1.254      0.7911     0.6064     0.4843
        sd       (0.08845)  (0.1048)   (0.2739)   (0.3453)   (0.3474)   (0.2732)   (0.2471)
‖·‖₂    best     -          -          -          -          1          13         36
        med.     0.5916     0.588      0.4133     0.5209     0.2603     0.1994     0.1736
        sd       (0.02193)  (0.02282)  (0.03472)  (0.03662)  (0.03817)  (0.0459)   (0.03317)
MRE     best     5          1          6          1          10         14         13
        med.     3.23       3.225      3.099      3.171      3.104      3.083      3.083
        sd       (0.3565)   (0.3487)   (0.2872)   (0.3025)   (0.2665)   (0.2525)   (0.2414)
AIC     best     -          -          -          -          4          16         30
        med.     760.5      756.9      660.8      697.6      616.2      600.7      602.2
        sd       (69.02)    (66.07)    (52.34)    (49.92)    (45.74)    (40.39)    (39.84)
BIC     best     -          -          -          -          11         20         19
        med.     773.6      773.4      680.6      720.7      642.6      630.4      635.1



Table 6: Study (2): Linear fit and linear spline fits for 50 simulations.

Variable knots:  1          2          3          4          5          6
‖·‖∞    best     -          -          1          16         15         18
        med.     1.387      1.286      1.14       0.4642     0.442      0.4773
        sd       (0.08547)  (0.3015)   (0.3556)   (0.2394)   (0.2146)   (0.2337)
‖·‖₂    best     -          -          -          24         15         11
        med.     0.5387     0.3356     0.2626     0.163      0.1871     0.1849
        sd       (0.01319)  (0.02739)  (0.06051)  (0.04077)  (0.04256)  (0.05947)
MRE     best     10         6          12         9          3          10
        med.     3.189      3.104      3.12       3.118      3.099      3.086
        sd       (0.3405)   (0.2687)   (0.248)    (0.2352)   (0.2144)   (0.2611)
AIC     best     1          3          5          18         14         9
        med.     703.6      639.7      621.5      593        593.5      598.3
        sd       (57.8)     (45.73)    (43)       (39.36)    (35.93)    (39.84)
BIC     best     2          3          8          21         12         4
        med.     723.4      666.1      654.5      632.6      639.6      651
Table 7: Study (2): Cubic spline fits for 50 simulations.

Fixed knots:     1          2          3          4          5          6
‖·‖∞    best     -          15         1          14         11         9
        med.     1.309      0.3749     0.5663     0.3207     0.3449     0.3494
        sd       (0.7008)   (0.2088)   (0.35)     (0.2479)   (0.3341)   (0.3235)
‖·‖₂    best     -          11         -          14         15         10
        med.     0.5636     0.1609     0.2504     0.1347     0.1353     0.1446
        sd       (0.02085)  (0.03708)  (0.0466)   (0.04189)  (0.0518)   (0.05427)
MRE     best     8          14         6          7          9          6
        med.     3.238      3.103      3.154      3.101      3.088      3.108
        sd       (0.3993)   (0.2511)   (0.2568)   (0.2449)   (0.2374)   (0.2364)
AIC     best     -          12         -          23         11         4
        med.     714.2      598.5      616.2      595.9      595.9      599.2
        sd       (57.14)    (42.12)    (39.69)    (39.18)    (38.6)     (38.8)
BIC     best     -          37         1          11         1          -
        med.     737.3      624.8      645.9      628.9      632.2      638.8



Table 8: Study (2): Cubic spline fits for 50 simulations.

Variable knots:  1          2          3          4          5          6
‖·‖∞    best     -          29         8          4          4          5
        med.     1.357      0.2841     0.348      0.4517     0.4851     0.6122
        sd       (0.7506)   (0.1916)   (1.514)    (1.926)    (2.918)    (1.886)
‖·‖₂    best     -          32         10         3          4          1
        med.     0.4675     0.126      0.1382     0.1713     0.1883     0.2139
        sd       (0.02244)  (0.04604)  (0.4086)   (0.2071)   (0.13)     (0.5814)
MRE     best     6          14         5          7          7          11
        med.     3.202      3.107      3.112      3.135      3.126      3.121
        sd       (0.3382)   (0.2447)   (0.4305)   (2.431)    (28.84)    (2285)
AIC     best     -          31         7          3          1          8
        med.     692        598.6      599.9      601.8      602.1      601.3
        sd       (50.82)    (39.37)    (39.02)    (36.1)     (39.31)    (39.43)
BIC     best     -          41         1          1          -          7
        med.     718.4      631.6      639.5      647.9      654.9      660.7